68 research outputs found

    Parallel border tracking in binary images for multicore computers

    Border tracking in binary images is an important operation in many computer vision applications. The problem consists of finding borders in a 2D binary image (where every pixel is either 0 or 1). Several algorithms are available for this problem, but most of them are sequential. In a previous paper, a parallel border tracking algorithm was proposed. This algorithm was designed to run on Graphics Processing Units (GPUs) and was based on the sequential algorithm known as the Suzuki algorithm. In this paper, we adapt the previously proposed GPU algorithm so that it can be executed on multicore computers. The resulting algorithm is evaluated against its GPU counterpart. The results show that the performance of the GPU algorithm degrades (or the algorithm even fails) for very large images or images with many borders. In contrast, the proposed multicore algorithm copes efficiently with large images.

    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has been partially supported by the Spanish Ministry of Science, Innovation, and Universities, jointly with the European Union, through Grants RTI2018-098085-BC41, PID2021-125736OB-I00 and PID2020-113656RB-C22 (MCIN/AEI/10.13039/501100011033/, "ERDF A way of making Europe"). Also, the GVA has partially supported this research through project PROMETEO/2019/109.

    García Mollá, VM.; Alonso-Jordá, P. (2023). Parallel border tracking in binary images for multicore computers. The Journal of Supercomputing. 79:9915-9931. https://doi.org/10.1007/s11227-023-05052-2
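    As a hedged illustration of why this problem can be split by image regions, the sketch below only marks border pixels (foreground pixels with a background 4-neighbour, treating pixels outside the image as background). It does not order those pixels into contours or build the border hierarchy that the Suzuki algorithm produces, and it is not the authors' multicore algorithm. The function names and the horizontal-strip decomposition are assumptions for illustration. CPython threads share one interpreter lock, so the thread pool shows the decomposition rather than a real speedup; a production version would use processes or compiled code.

```python
from concurrent.futures import ThreadPoolExecutor


def border_pixels_in_rows(img, row_start, row_end):
    """Return (row, col) of foreground pixels in rows [row_start, row_end)
    that touch the background through a 4-neighbour or the image edge."""
    h, w = len(img), len(img[0])
    found = []
    for r in range(row_start, row_end):
        for c in range(w):
            if img[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                # Pixels outside the image count as background (0).
                if not (0 <= rr < h and 0 <= cc < w) or img[rr][cc] == 0:
                    found.append((r, c))
                    break
    return found


def parallel_border_pixels(img, workers=4):
    """Split the image into horizontal strips and mark border pixels per strip.
    Each strip only reads its neighbouring rows, so strips are independent."""
    h = len(img)
    step = max(1, -(-h // workers))  # ceiling division: rows per strip
    bounds = [(s, min(s + step, h)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: border_pixels_in_rows(img, *b), bounds)
    return sorted(p for part in parts for p in part)
```

    For example, on a 5×5 image containing a 3×3 block of ones, `parallel_border_pixels(img, workers=2)` returns the eight outer pixels of the block but not its centre.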

    Solving systems of symmetric Toeplitz tridiagonal equations: Rojo's algorithm revisited

    More than 20 years ago, Rojo published [1] an algorithm for solving linear systems whose matrix is symmetric, tridiagonal, Toeplitz, and diagonally dominant. Rojo's technique is very efficient, O(n), and has been applied successfully to other similar problems: circulant tridiagonal systems, pentadiagonal Toeplitz systems, etc. In this article we extend Rojo's algorithm to matrices that are not diagonally dominant, thereby completing a useful tool for the aforementioned applications. Other algorithms that solve the same problem are also analysed and compared with the new version of Rojo's algorithm. © 2012 Elsevier Inc. All rights reserved.

    Supported by the Spanish Government (Projects TIN2008-06570-C04 and TEC2009-13741) and the Generalitat Valenciana (Project PROMETEO/2009/013).

    Vidal Maciá, AM.; Alonso-Jordá, P. (2012). Solving systems of symmetric Toeplitz tridiagonal equations: Rojo's algorithm revisited. Applied Mathematics and Computation. 219(4):1874-1889. https://doi.org/10.1016/j.amc.2012.08.030
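    For reference, the textbook O(n) way to solve such a system is Gaussian elimination without pivoting, i.e. the Thomas algorithm. The sketch below is that standard method, not Rojo's algorithm. It assumes the Toeplitz structure is given as two scalars, the diagonal value `a` and the off-diagonal value `b` (hypothetical parameter names), and it raises an error on a zero pivot. Strict diagonal dominance (|a| > 2|b|) guarantees that no zero pivot occurs, which hints at why the non-diagonally-dominant case needs separate treatment.

```python
def solve_sym_toeplitz_tridiag(a, b, d):
    """Solve T x = d, where T is n x n with a on the diagonal and b on both
    off-diagonals, by the Thomas algorithm (elimination without pivoting)."""
    n = len(d)
    c_prime = [0.0] * n  # modified super-diagonal after elimination
    d_prime = [0.0] * n  # modified right-hand side after elimination
    c_prime[0] = b / a
    d_prime[0] = d[0] / a
    # Forward elimination: remove the sub-diagonal b row by row.
    for i in range(1, n):
        denom = a - b * c_prime[i - 1]
        if denom == 0.0:
            raise ZeroDivisionError("zero pivot: elimination without pivoting breaks down")
        c_prime[i] = b / denom
        d_prime[i] = (d[i] - b * d_prime[i - 1]) / denom
    # Back substitution on the resulting unit upper bidiagonal system.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

    With a = 4, b = 1 and right-hand side [5, 6, 6, 5], the solution is [1, 1, 1, 1].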

    High-performance computing: the essential tool and the essential challenge

    Prologue to the Journal of Supercomputing, volume 73, issue 1.

    We would also like to thank the “Ministerio de Educación y Ciencia” of Spain for its support of the Spanish CAPAP-H5 network (HPC in Heterogeneous Systems, TIN2014-53522-REDT), and the “Ministerio de Economía y Competitividad” of Spain/FEDER for supporting Grants TEC2015-67387-C4-1-R and TEC2015-67387-C4-3-R.

    Alonso-Jordá, P.; Ranilla, J.; Vigo-Aguiar, J. (2017). High-performance computing: the essential tool and the essential challenge. The Journal of Supercomputing. 73(1):1-3. https://doi.org/10.1007/s11227-016-1922-5

    Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems

    Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines that automatically adapt to the conditions of the underlying computational system, so that efficient executions are obtained regardless of the end user's experience. This paper explores programming routines that adapt automatically to the conditions of the computational system by means of these automatic tuning methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for computing matrix functions such as the sine or cosine, but it is also very time-consuming, since its basic computational kernel, matrix multiplication, is carried out many times. Using all the available resources within a node easily and efficiently is crucial for the end user.

    This work has been partially supported by Generalitat Valenciana under Grant PROMETEOII/2014/003, and by the Spanish MINECO, as well as European Commission FEDER funds, under Grants TEC2015-67387-C4-1-R and TIN2015-66972-C5-3-R, and network CAPAP-H. We have also worked in cooperation with the EU-COST Programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)".

    Boratto, M.; Alonso-Jordá, P.; Gimenez, D.; Lastovetsky, A. (2017). Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems. The Journal of Supercomputing. 73(1):227-239. https://doi.org/10.1007/s11227-016-1694-y
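    The abstract does not detail the tuning methodology, so the following is only a generic sketch of empirical auto-tuning: time a kernel for several values of a tunable parameter on the current machine and keep the fastest. The kernel here is a tiled matrix multiplication (the kernel the abstract identifies as dominant), and the parameter is a hypothetical tile size; the function names and candidate values are assumptions. In pure Python the timings mostly reflect interpreter overhead, so only the search loop, not the value it selects, is meaningful.

```python
import time


def blocked_matmul(A, B, bs):
    """Multiply square matrices (lists of lists) using bs x bs tiles."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Accumulate the contribution of tile (ii, kk) x (kk, jj).
                for i in range(ii, min(ii + bs, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += aik * Bk[j]
    return C


def autotune_block_size(n, candidates, repeats=3):
    """Empirically pick the fastest tile size for n x n products on this machine."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best, best_time = None, float("inf")
    for bs in candidates:
        # Keep the minimum of several runs to reduce timing noise.
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(A, B, bs)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best, best_time = bs, elapsed
    return best
```

    A tuned library would typically run such a search once per machine and store the result, so later calls pay no search cost.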

    Efficient GPU implementation of a Boltzmann‑Schrödinger‑Poisson solver for the simulation of nanoscale DG MOSFETs

    81–102, 2019) describes an efficient and accurate solver for nanoscale DG MOSFETs based on a deterministic Boltzmann-Schrödinger-Poisson model with seven electron–phonon scattering mechanisms on a hybrid parallel CPU/GPU platform. The transport phase, i.e. the time integration of the Boltzmann equations, was ported to the GPU using CUDA extensions, but the computation of the system's eigenstates, i.e. the solution of the Schrödinger-Poisson block, was parallelized only with OpenMP because of its complexity. This work fills that gap by describing a GPU port of the Schrödinger-Poisson solver. The new proposal implements on the GPU a Scheduled Relaxation Jacobi method to solve the sparse linear systems that arise from the 2D Poisson equation. The 1D Schrödinger equation is solved on the GPU by adapting a multi-section iteration and the Newton-Raphson algorithm to approximate the energy levels, and the Inverse Power Iterative Method is used to approximate the wave vectors. We want to stress that this Schrödinger-Poisson solver can be thought of as a module independent of the transport (Boltzmann) phase and can be used by solvers with different levels of description for the electrons. It is therefore of particular interest, because it can be adapted to other macroscopic, and hence faster, solvers for confined devices used at the industrial level.

    Project PID2020-117846GB-I00 funded by the Spanish Ministerio de Ciencia e Innovación. Project A-TIC-344-UGR20 funded by the European Regional Development Fund.
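    A Scheduled Relaxation Jacobi method applies ordinary Jacobi sweeps whose relaxation factor ω changes from sweep to sweep according to a precomputed schedule. The sketch below shows that structure for the 5-point discretization of the 2D Poisson equation, -(u_xx + u_yy) = f, with zero Dirichlet boundary values. The optimized SRJ schedules from the literature (which include over-relaxed factors ω > 1) are not reproduced here, so the default schedule `(1.0,)` reduces to plain Jacobi; the function name and parameters are hypothetical. Every grid point in a sweep is updated from the previous iterate only, which is what makes the method a natural fit for one GPU thread per point.

```python
def relaxed_jacobi_poisson(f, h, schedule=(1.0,), tol=1e-8, max_sweeps=100000):
    """Solve -(u_xx + u_yy) = f on an n x n interior grid with spacing h and
    zero Dirichlet boundary, using Jacobi sweeps whose relaxation factor
    cycles through `schedule`. Returns the interior values of u."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]

    def at(v, i, j):
        # Values outside the interior grid are the zero boundary.
        return v[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    for sweep in range(max_sweeps):
        omega = schedule[sweep % len(schedule)]
        new = [[0.0] * n for _ in range(n)]
        max_change = 0.0
        for i in range(n):
            for j in range(n):
                # Plain Jacobi value from (4u - neighbours) / h^2 = f.
                jac = 0.25 * (at(u, i - 1, j) + at(u, i + 1, j)
                              + at(u, i, j - 1) + at(u, i, j + 1) + h * h * f[i][j])
                new[i][j] = (1.0 - omega) * u[i][j] + omega * jac
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            break
    return u
```

    Only convergent schedules should be passed; a single factor ω > 1 used on every sweep can make Jacobi diverge.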

    Teaching computer science at a technical school of cartography and surveying (ETSIGCT)

    Courses with computer science content are taught in many non-computing university degrees that nevertheless use computing as a fundamental tool in their work. The project consists of several tasks aimed at improving the teaching of the Computer Science courses at this University School. These tasks can be summarized as follows: adapting course content to new requirements, coordinating courses from different schools and departments that share common ties, improving the teaching methodology, introducing new teaching resources, and changing the assessment system by taking advantage of the new technological resources available and the experience gained in other projects.

    Exploring Hybrid Parallel Systems for Probabilistic Record Linkage

    Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real-world entity. This integration can be done deterministically or probabilistically, depending on whether common key attributes exist across all the data sources involved. The probabilistic approach is very time-consuming because of the number of records that must be compared, especially in big data scenarios. In this paper, we propose and evaluate a methodology that simultaneously exploits multicore and multi-GPU architectures to perform the probabilistic linkage of large-scale Brazilian governmental databases. We present some algorithmic optimizations that provide high accuracy and improve performance by defining the best algorithm-architecture combination for a problem given its input size. We also discuss performance results obtained with different data samples, showing that a hybrid approach outperforms other configurations, providing an average speedup of 7.9 when linking up to 20,000 million records.

    This work has been partially supported by CNPq, FAPESB, Bill & Melinda Gates Foundation, The Royal Society (UK), Medical Research Council (UK), NVIDIA Hardware Grant Program, Generalitat Valenciana (Grant PROMETEOII/2014/003), Spanish Government and European Commission through TEC2015-67387-C4-1-R (MINECO/FEDER), and network CAPAP-H. We have also worked in cooperation with the EU-COST Programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)".

    Boratto, M.; Alonso-Jordá, P.; Pinto, C.; Melo, P.; Barreto, M.; Denaxas, S. (2019). Exploring Hybrid Parallel Systems for Probabilistic Record Linkage. The Journal of Supercomputing. 75:1137-1149. https://doi.org/10.1007/s11227-018-2328-3
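    The abstract does not give the comparison model, but probabilistic record linkage is classically based on the Fellegi–Sunter framework, and the sketch below assumes that framework. Each field contributes log2(m/u) when two records agree on it and log2((1 - m)/(1 - u)) when they disagree, where m and u are the probabilities of agreement among true matches and among non-matches. The m/u values, thresholds, and field names are hypothetical. Because every candidate pair is scored independently, the pairs can be distributed across CPU cores and GPUs, which is where the cost of comparing a very large number of pairs is paid.

```python
import math


def fs_weight(rec_a, rec_b, m, u):
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights over fields.
    m[field]: P(field agrees | records match)
    u[field]: P(field agrees | records do not match)
    Exact equality is used as the agreement test (missing fields compare as None)."""
    total = 0.0
    for field in m:
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def classify(weight, upper, lower):
    """Three-way decision rule: link, clerical review, or non-link."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"
```

    For example, with m = 0.9 and u = 0.01 for a name field, agreement on that field adds log2(90) ≈ 6.49 to the pair's weight.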

    Block pivoting implementation of a symmetric Toeplitz solver

    Toeplitz matrices are characterized by a special structure that can be exploited to obtain fast linear system solvers. These solvers are difficult to parallelize because of their low computational cost and their closely coupled data operations. We propose transforming the Toeplitz system matrix into a Cauchy-like matrix, since the latter can be divided into two independent matrices of half the size of the system matrix, and each of these smaller matrices can be factorized efficiently on multicore computers. We use OpenMP and store data in memory by blocks in consecutive positions, yielding a simple and efficient algorithm. In addition, by exploiting the fact that diagonal pivoting does not destroy the special structure of Cauchy-like matrices, we introduce a local diagonal pivoting technique that improves the accuracy of the solution and the stability of the algorithm.

    This work was partially supported by the Spanish Ministerio de Ciencia e Innovacion (Projects TIN2008-06570-C04-02 and TEC2009-13741), the Vicerrectorado de Investigacion de la Universidad Politecnica de Valencia through PAID-05-10 (ref. 2705), and the Generalitat Valenciana through project PROMETEO/2009/2013.

    Alonso-Jordá, P.; Dolz Zaragozá, MF.; Vidal Maciá, AM. (2014). Block pivoting implementation of a symmetric Toeplitz solver. Journal of Parallel and Distributed Computing. 74(5):2392-2399. https://doi.org/10.1016/j.jpdc.2014.02.003
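    To illustrate the closely coupled data operations mentioned above, the sketch below implements the classical Levinson recursion for a symmetric positive definite Toeplitz matrix given by its first column. This is a different, well-known O(n²) method, not the Cauchy-like block algorithm proposed in the paper: each step k needs the complete vectors from step k - 1, so the outer loop cannot simply be split across cores. The function name is hypothetical, and the sketch assumes positive definiteness (it performs no pivoting).

```python
def levinson_solve(t, b):
    """Solve T x = b, where T is the symmetric positive definite Toeplitz
    matrix with first column t (T[i][j] = t[abs(i - j)]), by Levinson recursion."""
    n = len(t)
    r = [ti / t[0] for ti in t[1:]]    # normalize so the diagonal is 1
    rhs = [bi / t[0] for bi in b]
    x = [rhs[0]]                       # solution of the leading 1x1 system
    if n == 1:
        return x
    y = [-r[0]]                        # solution of the leading Yule-Walker system
    alpha, beta = -r[0], 1.0
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        # Extend x from size k to k + 1 using y in reversed order.
        mu = (rhs[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            # Extend y so that it is ready for the next step.
            alpha = (-r[k] - sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x
```

    For the 3×3 matrix with first column [2, 1, 0] and right-hand side [1, 1, 1], the recursion yields x = [0.5, 0, 0.5].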

    Two Taylor Algorithms for Computing the Action of the Matrix Exponential on a Vector

    Ibáñez González, JJ.; Alonso Abalos, JM.; Alonso-Jordá, P.; Defez Candel, E.; Sastre, J. (2022). Two Taylor Algorithms for Computing the Action of the Matrix Exponential on a Vector. Algorithms. 15(2):1-48. https://doi.org/10.3390/a15020048
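    This entry has no abstract, so the following only illustrates the problem named in its title: approximating exp(tA)v using matrix-vector products instead of forming exp(tA). It is a generic truncated Taylor series applied over a fixed number of substeps (h = t/steps). The function name and the `steps` and `degree` values are assumptions, and the sketch does not reproduce the two algorithms of the paper, whose parameter selection is not described here.

```python
def expm_action(A, v, t=1.0, steps=10, degree=20):
    """Approximate exp(t*A) @ v for a dense matrix A (list of lists).
    Each substep applies the degree-`degree` Taylor polynomial of exp(h*A),
    h = t / steps, using only matrix-vector products."""
    n = len(v)
    h = t / steps
    w = list(v)
    for _ in range(steps):
        term = list(w)  # k-th Taylor term (h*A)^k w / k!, starting at k = 0
        acc = list(w)   # running sum of the Taylor terms
        for k in range(1, degree + 1):
            term = [h * sum(A[i][j] * term[j] for j in range(n)) / k for i in range(n)]
            acc = [acc[i] + term[i] for i in range(n)]
        w = acc
    return w
```

    For the rotation generator A = [[0, 1], [-1, 0]] and v = [1, 0], `expm_action(A, v)` approximates [cos 1, -sin 1].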

    Advances in the Approximation of the Matrix Hyperbolic Tangent

    In this paper, we introduce two approaches to computing the matrix hyperbolic tangent. One is based on its definition and uses the matrix exponential; the other is based on its Taylor series expansion. For the latter, we analyse two different alternatives for evaluating the corresponding matrix polynomials. This resulted in three stable and accurate codes, which we implemented in MATLAB and compared numerically and computationally on a battery of tests composed of distinct state-of-the-art matrices. Our results show that the Taylor series-based methods were more accurate, although somewhat more computationally expensive, than the approach based on the matrix exponential. To avoid this drawback, we propose a set of formulas that evaluate polynomials more efficiently than the traditional Paterson–Stockmeyer method, substantially reducing the number of matrix products (to practically the same number as the approach based on the matrix exponential) without penalising the accuracy of the result.

    This research was funded by the Spanish Ministerio de Ciencia e Innovacion under grant number TIN2017-89314-P.

    Ibáñez González, JJ.; Alonso Abalos, JM.; Sastre, J.; Defez Candel, E.; Alonso-Jordá, P. (2021). Advances in the Approximation of the Matrix Hyperbolic Tangent. Mathematics. 9(11):1-20. https://doi.org/10.3390/math9111219
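    The baseline named in the abstract, the Paterson–Stockmeyer method, evaluates p(A) = c_0 I + c_1 A + ... + c_m A^m by precomputing A^2, ..., A^s with s ≈ √m and then applying Horner's rule in A^s to blocks of s coefficients. The sketch below is a plain implementation of that baseline, not the authors' new formulas. The helper names are assumptions, and the returned count is the number of matrix products this sketch performs (it does not apply every known saving).

```python
import math


def matmul(X, Y):
    """Dense product of square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def paterson_stockmeyer(coeffs, A):
    """Evaluate p(A) = sum_k coeffs[k] * A^k.
    Returns (p(A), number of matrix products performed)."""
    n = len(A)
    m = len(coeffs) - 1
    s = max(1, math.isqrt(m))          # block size, s ~ sqrt(m)
    powers = [identity(n), A]          # A^0 and A^1 need no products
    products = 0
    for _ in range(2, s + 1):          # A^2 .. A^s: s - 1 products
        powers.append(matmul(powers[-1], A))
        products += 1
    As = powers[s]

    def block(j):
        # B_j = sum_{i=0}^{s-1} c_{j*s+i} A^i: a linear combination of stored powers.
        out = [[0.0] * n for _ in range(n)]
        for i in range(s):
            k = j * s + i
            if k > m:
                break
            c, P = coeffs[k], powers[i]
            for row in range(n):
                for col in range(n):
                    out[row][col] += c * P[row][col]
        return out

    q = m // s                         # p(A) = sum_{j=0}^{q} B_j (A^s)^j
    result = block(q)
    for j in range(q - 1, -1, -1):     # Horner's rule in A^s: q products
        prod = matmul(result, As)
        products += 1
        Bj = block(j)
        result = [[prod[row][col] + Bj[row][col] for col in range(n)] for row in range(n)]
    return result, products
```

    For m = 16 this sketch uses s = 4 and performs 7 matrix products, compared with 15 for plain Horner evaluation starting from c_16 A + c_15 I.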